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Abstract 

The paramount importance of replicating associations is well recognized in the 
genome-wide associaton (GWA) research community, yet methods for assessing 
replicability of associations are scarce. Published GWA studies often combine 
separately the results of primary studies and of the follow-up studies. Informally, 
reporting the two separate meta-analyses, that of the primary studies and follow-up 
studies, gives a sense of the replicability of the results. We suggest a formal em- 
pirical Bayes approach for discovering whether results have been replicated across 
studies, in which we estimate the optimal rejection region for discovering repli- 
cated results. We demonstrate, using realistic simulations, that the average false 
discovery proportion of our method remains small, while the power is much greater 
than that of a frequentist approach. We apply our method to six type two diabetes 
(T2D) GWA studies. Out of 803 SNPs discovered to be associated with T2D using 
a typical meta-analysis, we discovered 219 SNPs with replicated associations with 
T2D. We recommend complementing a meta-analysis with a replicability analysis 
for GWA studies. 

1 Introduction 

The aim of a genome-wide association (GWA) study is to identify genetic variants 
that are associated with a given phenotype. GWA studies of the same phenotype can 
be combined in order to increase detection power to establish association. For exam- 
ple, |\bightetaL (2010) discover in a meta-analysis single nucleotide polymorphisms 



(SNPs) associated with type 2 diabetes (T2D) that were not discovered in single stud 
ies. 

The paramount impor tance of replicat i ng ass ociat ions has been well-r e cognized in 



the GWAS literature, e.g. McCarthy et a l. (2008) and|NCI-NHGRI (2007). iKraft et al 
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(2009) note that for common variants, the anticipated effects are modest and very sim- 
ilar in magnitude to the subtle biases that may affect genetic association studies - most 
notably population stratification bias. For this reason, they argue that it is important to 
see the association in other studies conducted using a similar, but not identical, study 
base. 

Typical meta-analysis aims to discover the associations that are present in at least 
one study, not replicated associations. Replicability analysis aims to discover replicated 
associations, i.e. associations between SNP and phenotype that are present in more 
than one of the studies. Meta-analysis methods are not appropriate for discovering 
replicated associations. To see this, consider the scenario where for testing the null 
hypothesis that a SNP is independent of the phenotype, the p-value is extremely small 
in one study, but not small at all in the other studies. Still, the meta-analysis will result 
in a small combined p-value. In this scenario, there is evidence of association of this 
SNP with the phenotype but there is no evidence that this association is replicated. 
Therefore, a small p-value in a typical meta-analysis is evidence towards association 
of the SNP with the phenotype in at least one study, but it is not evidence that the 
association has been replicated in more than one study. 

Many meth ods exist for meta-analys i s, where follow-up studies simply serve to 



add power. See Hedges and Olkin ( 1985), Beniamini and Yekutieli (2005), Skol et al 
(2006), and IZeggini et alj (12007ft . among others. Howe ver, only a handful of meth- 
ods have been suggested so far for replicability analysis. Beniamini et al. (2009) sug- 
gests applying the Benjamini-Hochberg procedure dBeniamini and Hochbergl 1 1995b . 
henceforth referred to as the B H procedure, on partial conjunction hypotheses p-values 



Bogomolov and Helled ( 20121) focus on replicability analysis for two studies, and sug- 



gest an alternative false discovery rate (FDR) controlling procedure for this setting. 
In this work, we suggest an empirical Bayes approach to replicability analysis . This 



approa ch may be viewed as an extension of the empirical Bayes approach of lEfron 



(2008). We estimate the local Bayes FDRs under the various configurations of as- 
sociation status of SNP with phenotype in every study, and then sum up the relevant 
probabilities in order to declare associations as replicated as long as they are below the 
estimated Bayes FDR cut-off. 

Although the proposed approach is a general approach for assessing replicability 
in several studies when each study examines the same hypotheses, we discuss it in the 
context of GWA studies. In Section|2]we introd uce the notation, and re view the formal 
frequentist method for replicability analysis of Beniamini et al ] d2009l) as well as the 
empirical Bayes approach to multiple testing of lEfron ( 20081) . In Section[5]we present 
the empirical Bayes approach to replciability analysis. In Section|4]we use simulations 
to evaluate the performance of our method. Specifically, in realistic simulations we 
show that the average false discovery proportion (FDP ) of our method rem ains small, 
while the power is much greater than the method of lBeniamini et al. (2009). In Section 
|5]we apply our method to the T2D GWA studies that motivated this work. 
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2 Notation, Goal, and Review 



Suppose there are n independent studies, and in each study M SNPs are measured. 
Let Hij be the indicator of whether SNP j in study i is positively, negatively, or not 
associated with the phenotype: 

{1 if SNP j is positively associated with the phenotype in study i, 
if SNP j is not associated with the phenotype in study i, 
— 1 if SNP j is negatively associated with the phenotype in study i. 

If the SNP is null in study i, i.e. it is not associated with the phenotype, then the 
z-score has a standard normal density fo(z). Otherwise, the z-score of study i has 
density fi,\(z) or depending on whether the SNP is positively or negatively 

associated with the phenotype. Formally, let Zij be the z-score of SNP j in study i. 
Then the conditional density of given is 

r / M (z) it //,, i. 

f{z\H ij )=\ fo(z) if fly =0, 

[ fi,-l(z) if = -1. 

Let H = {h = (hi, . . . , h n ) : hi G {—1,0, 1}} be the set of 3™ possible configura- 
tions of the vector of association status (of SNP with phenotype) in the n studies. Null 
hypotheses for the n studies correspond to subsets of H through the relation that H° is 
true for SNP j if and only if Hj — (H\j, ■ ■ ■ , H n j) is in subset H°. In this work we 
consider two null hypotheses: the no association null hypothesis H^ A that the SNP is 
not associated with the phenotype in any of the studies that corresponds to 

H% A = {(0,0,--- ,0)}; 

the no replicability null hypothesis H^ R that the SNP is positively and negatively 
associated with the phenotype in at most one study that corresponds to 

H° NR = {h : \ht = -1| < 1 R \ht = 1| < 1}. 

Our goal is to discover as many SNPs as possible with false H^ R , i.e. as many 
SNPs from the index set {j : Hj e H/W% R }. This goal is distinct from the goal in a 
typical meta-analysis, of discovering as many SNPs as possible with false H^ A . For 
example, for n = 3 studies, H contains 3 3 = 27 configurations, H% A = {(0,0,0)}, 
and 

H% R = {(0,0,0), (1,0,0), (0, 1,0), (0,0, 1), (-1,0,0), (0, -1,0), (0,0, -1), 
(1,-1,0), (-1,1,0), (-1,0,1), (1,0,-1), (0,1,-1), (0,-1,1)}. 



2.1 The partial conjunction approach to replicability analysis 

In lBenjamini et al.l d2009l) the partial conjunction approach jBeniamini and Hellei , 2008) 
has been suggested for replicability analysis when n > 2 studies are available. The 
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recipe suggested in iBeniamini and Hellerl (|2008) for computing p-values for the no 



replicability null hypotheses was as follows. First, for every subset of n — 1 stud- 
ies, a meta-analysis p-value was computed. Then, the p-value for the no replicability 
null hypothesis was set to be the maximum of the n meta-analysis p-values. Since 
we considered in this work a concordant version of replicability, where the associa- 
tion was considered replicated only if it was present in at least two studies in the same 
direction, the p-value was taken to be twice the smaller of the left- and righ t-sided 
combined p-values using the method of Fisher, as suggested in lOwenl d2009l) . This 



p-values is conservative, i.e. its distribution under the null hypothesis of no replicabil- 
ity is stochastically larger than the uniform, making it practically difficult to discover 
replicated associations. The BH procedure was applied to the p-values of the M no 
replicability null hypotheses. 

2.2 The empirical Bayes approach to multiple testing 

The two group m odel provides a simple Bayesian framework for multiple testing, see 
e.g. Chapter 2 in lEfro iil (l2010l) . Each SNP in study i has marginal probability 7To(«) 



of not being associated with the phenotype, i.e. Pr(Hij = 0) = 7To(i). Conditional 
on Hij ~ 0, the SNP has a standard normal density, fo(z). Unconditionally, the 
marginal (mixture) density is fi(z). Since in this section we only examine one study, 
for notational convenience let ttq = and f(z) = fi(z). For a subset Z of 5R, let 
P {Z) = j z f (z)dz and P(Z) = j z f(z)dz. 

Suppose we observe zij G Z and wonder whether ify = 0. A direct application of 
Bayes rule yields 

Fdr(Z) = Pr{E ld = 0|*y e Z) = tt P q (Z)/P(Z). 

Adopting the terminology in lEfronl d2010h . we call Fdr(Z) the Bayes FDR for Z: if 
we report z%j G Z as non-null, i.e. if we report Hij ^ 0, then Fdr(Z) is the chance 
that we have made a false discovery, i.e that Hij = 0. If Z is a single point zq, then the 
local Bayes FDR is 

fdr(zo) = Pr(Hij = 0\z = z ) = ir f (z )/ f(z ). 



Fdr(Z) is the conditional expectation of fdr(z) given z € Z ( Efr on and Tibshirani , 

20021) . 



, > Kofo(z)dz f., fdr(z) f(z)dz , , Nl 

Fdr(Z) = Jz ^ ' = ill ^ » K ' = E f (fdr(z)\z £ Z), (1) 



Similar to lStorev (2007) and I Sun and Call d2007l) . we observe in Equation (Q]i that 



among all possible rejection regions Z constrained to satisfy that Fdr(Z) < q, the 
region with maximal probability will be of the form 

Zor = {z : fdr(z) < t(q)}. (2) 
The result is stated formally in the following proposition. 
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Proposition 2.1. Assume the two group model holds for the z-scores. Let t(q) in 
expression (0 be such that Fdr(Zon) = q- For any Z satisfying Fdr(Z) < q, 

P{Z) < P(Z r). 

See the proof inlAl 

In the two group model, tto and P (Z) are neede d in order to compute the Bayes 
FDR. The empirical Bayes approach (lEfronl 1201 01) estimates P(Z) by the empiri- 
cal distribution, which is the fraction of z-s c ores i n Z. Viewing ttq as the fraction 
of true null hypotheses in the study, lEfron (2010) in the R package locfdr, avail- 



able on CRAN, suggests estimating ttq using the central 50% of the null distribu- 
tion: 7To = ^• z ' : ' e ^ — |°'xo^5 $ — < ' ' 7j ^^ , where is the inverse of the standard 
normal cumulative distribut i on function. Other me thods for estimat i ng 7Tn include 
IStorey and Tibshiranil d2003l) . Beniamin i et alJ (120061) , and Jin and Cai ( 20071) . 



When the rejection region is based on the tails of the z-scores, which are equivalent 
to one-sided p-values, there is a strong connection between empirical Bayes estimation 
of the Bayes FDR and the frequentist BH p rocedure for FDR control, as noted by 
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Efro n and Tibs hirani ( 2002) and IS tore vl ( 120021) . For the jth p- value in study i, let z 



$ 1 (pij). Then the BH adjusted p-value is 

Fdr BH (Z :j ) = min ) , Zj = {z ik : z ik < z^}. (3) 

The BH rule rejects all hypotheses with z-scores with Fdr bh {Zj) < Q- Since \Zf\jM 
is the empirical distribution of P(Zj), then if we set ttq conservatively to be one, the 
BH procedure coincides with the procedure that chooses the largest Zj so that the 
estimated Fdr(Zj) is at most q. 

If the local Bayes FDR can be estimated, then for a rejection region Z, equation 
(Q]l shows that Fdr(Z) may be estimated by 

where fdr(zij) is the estimated local Bayes FDR of z-score Zij, and \{j : Zij 6 Z}\ 
is the number of z-scores in Z. Therefore, an alternative to the BH procedure is to 
define the estimated optimal rejection region (ffjl, Z = {z^ : fdr(zij) < t(q)}, and 

choose the largest t(q) so that Fdr(Z) is at most q. This procedure coincides with 
the BH procedure only if the estimated local Bayes FDR is a monotone function of the 
z-scores. 

The R package locfdr can be used for estimating the local Bayes FDR and the 
Bayes FDR, see details in lEfron ( 2010l) . In the next Section [3] we shall generalize the 



two group model to a multi-group model, and show how the corresponding generalized 
Bayes FDR can be estimated. 
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3 The empirical Bayes approach to replicability analy- 
sis 



3.1 Generalization of the two group model 

Each SNP has probability w(h) of having association configuration h, i.e. Pr(-ffj = 
h) = tt(Ii). Conditional on Hj = h, the vector of n z-scores Zj = (zij, ■ ■ ■ , z n f) 
has density f(zj\h) = YYi=i smce tne z-scores are independent across stud- 

ies conditional on the association status. Note that ttq({) is equal to the sum of the 
probabilities tt(Ii) over all configurations h £ "H with hi = 0. 

Suppose we observe Zj for SNP j and wonder whether Hj 6 H°. A direct appli- 
cation of Bayes rule yields the local Bayes FDR 

fdrwft) = Pv{H 3 e H°\zj) = £ ]£)//&•), (4) 

hen 



where f(zj) — J^heu 7r (' l )/(%l^) ' s the mixture density. The local Bayes FDR for 
SNP j for null hypothesis H^ A and H^ B , respectively, is 

fdr n o (z 3 ) = Pr(H, e n° NA \z J ) and fdr n o (z,) = Pr(H, £ H° NR \zj). 



For a subset Z of 5ft", if we report for zj £ Z that Hj £ H , then the probability 
that H 3 £ n° is the Bayes FDR, Fdr n a (Z). As inQ] 

Fdr H o (Z) = Pr(H 3 £ H° \ z 3 e Z) = E f (fdr n o (zj)\zj £ Z). (5) 



Following the argument in Section 12.21 the optimal rejection region to discover 
SNPs that are non-null, i.e. Hj ^ T-L°, in the sense that this rejection region has 
maximal probability among all possible regions that are constrained to have a Bayes 
FDR of at most level q, is 

Zor,h" = {z: fdr n o(z) < t(q)}, (6) 

where t{q) is such that Fdr n o (Z O r,u ) = 1- 

Example 3.1. For n = 2 studies, suppose the marginal z-score density in the two 
studies is N(0, 1) under the no-association null hypothesis, and under the alternative 
positive association hypotheses the z-score density is N(p,i, 1) in the first study and 
N(/j, 2 , 1) in the second study. Thus the joint z-score density is 

f(z u z 2 ) = tt(0, 0)000(^2) +7r(O,l)0(«i)^(«a-/*a) 

+ 7r(l, 0)(f>(z 1 - fj,i)(f>(z 2 ) + 7r(l, 1)0(zi - (J,i)<f>(z2 - jU 2 ), 

for 4>(z) the standard Normal density. For h £ {(0, 0), (0, 1), (1, 0), (1, 1)}, the condi- 
tional probability that H — [hi, h 2 ) given (zi, z 2 ) is 

- 7r(/t)(/)(zi - J(fei = l)/ii)^2 - I(h 2 = l)/i 2 ) 

Pr(/i (,Zl,ZlJJ - — r , 
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Figure 1 : Optimal (solid curves) and p-value based (dotted-dashed curves) rejection 
regions boundaries for bayes FDR levels q E {0.20, 0.05, 0.01} for /ii = 3 and [1% = 3 
of the test of H^ A (left) and H^ R (middle) , and for /«i = and /12 = 3 for the test of 
H^ A (right). The further the boundary is from (0,0) the smaller the value of q. 

where 1(f) is the indicator function. The local fdr for testing H^ A and H^ R , respec- 
tively, is fdrjs[A{z\i Z2) = P r ((0, 0)| (z\, zf)) and 

fdr NR ( Zl ,z 2 ) =Pr{{0,0)\ (*i,2i))+Pr((0,l)| [z u zi)) + Pr((l, 0)| (zi.zi)). 

Le?7r((0,0)) = 0.80, vr((0, 1)) = tt((1,0)) = 0.08,, and tt((1, 1)) = 0.04. We 
computed two rejection regions: the optimal rejection region, and a rejection region 
based on p-values. The p-values for H^ A and were, respectively, the p-values 

of the Fisher combined (right-sided) p-values, and the maximum of the two studies p- 
values. For [i\ = = 3, Figure Q] shows the rejection region boundaries for the 
optimal rejection region and the p-values based rejection region for testing H^ A (left) 
and Hpf R (middle). Clearly, the rejection regions are much larger for detecting asso- 
ciations than for detecting replicability. Figure [7J ( right) shows the rejection regions 
for ii\ = and /12 = 3 when testing H^ A , where the difference between the opti- 
mal rejection region and the rejection region based on p-values is larger than when 
H-\ = M2 = 3. Specifically, the optimal rejection region is only determined by the 
z-score of the second study, Z 2 . 

Table Q] shows the probability of the rejection regions for the no association and 
no replicability null hypotheses. The probabilities of the rejection regions to discover 
replicability are much smaller than for discovering associations. Moreover, the proba- 
bilities of the optimal rejections regions are larger than for the p-value based regions, 
and the differences between the probabilities of the regions are larger for configuration 
(pi,V2) = (0,3) than for (^1,^2) = (3,3). 

The following example illustrates the large loss of power due to a non-optimal 
choice of rejection region that can occur when more than two studies are available. 

Example 3.2. Forn = ^studies, let tt((0, 0, 0, 0, 0, 0)) = 0.90 and tt((0, 0, 0, 0, 0, 1)) = 
0.10. Thus the first five z-scores Z\---Z§ are N(0, 1). The sixth z-score Zq is 
N(0, 1) with probability 0.9 and N(3, 1) with probability 0.1. Similar to the setting 
(Mlj M2) = (0, 3) in Example \3.1\ the p-value based rejection region for testing H^ A 
is very different than the optimal rejection region, which is only based on Zq. For a 



7 



Table 1 : The probability of the optimal and of the p- value based rejection regions, for 



various Bayes FDR levels q and two configurations of (^1,^2)- 



q 


(M1.M2) = (3,3) 

NA NR 

Zor p-value based Z Zor p-value based Z 


OiiM2) = (0,3) 

NA 

Zor p-value based Z 


0.20 

0.050 
0.01 


0.2230 0.2182 0.0417 0.0410 
0.1498 0.1417 0.0234 0.0230 
0.0974 0.0900 0.0098 0.0096 


0.1355 0.1178 
0.0855 0.0621 
0.0497 0.0281 



Bayes FDR of q — 0.05, the probability of the optimal rejection region was 0.066, and 
the probability of the p-value based rejection region was 0.012. 

In order to test any null hypothesis H° on the n studies, we need to first estimate 
the local Bayes FDR for the observed z-scores, {fdr^o (z k ) : k = 1, . . . , M}, We use 
these estimates to estimate the Bayes FDR © for every z-score Zj (j = 1, . . . , M): 

where Zj = {z k : fdr Ho (z k ) < fdr Ho (zj), k = 1, . . . , M}. 

Let i(q) be the largest estimated local Bayes FDR satisfying Fdr(Zj) < q. Then, 
our estimate of the optimal rejection region © is {z k : fdr Ua {zk) < i(q),k 

1 M}. We conclude that SNP k is non-null, i.e. H k H , if fdr Ha (z k ) < i(q), 

or equivalently, if Fdr(Z k ) < q. 

In practice, we do not know Tr(h) and the conditional z-score densities that are 
necessary for estimating the local Bayes FDR. In the next Sections |3.2| and [331 we 
discuss, respectively, estimation of the conditional z-score densities and estimation of 



Remark 3.3. In example s \3.1\ and \3.2\ we showed that a p-value based rejection region 
may be far from optimal. The p-value based rejection regions in Figure\l\were de- 
termined according to exact Bayes FDR computations, whereas in practice the p-value 
based rejection regions are specified by the BH adjusted p-values defined in equation\3\ 
resulting in an even more conservative procedure. The rejection region with FdrsH = 
0.05 for testing H^ A actually has Fdr = 0.04 and a rejection probability of '0. 133 , for 
testing Hpf R actually has Fdr = 0.0022 with rejection probability only 0.0028. These 
probabilities are smaller than the probabilities with exact Fdr computation in Table\l\ 
by a factor of 1.06 for H^ A and 8.2 for H^ R . Figure \3~3\ displays Fdr Bjj(Zj) versus 
Fdr(Zj) for H^ A and H^ R in the setting of Example \3.1\ with (ji-y, ^ 2 ) = (3, 3) and 
q = 0.05, for the prior probabilities of Example \3.1\ ( left) and for prior probabilities 
tt((0,0)) = 0.50,tt((0,1)) = tt((1,0)) = 0.23, 7r((l, l)j = 0.04 (right). The figure 
demonstrates three key observations about the difference Fdr bh (Zj ) — Fdr(Zj): (1) 
the difference is larger when testing H^ R than when testing H^ A ; (2) the difference 
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Figure 2: The Bayes FDR value q (solid line), Fdrsn for the test of H^ A (bot- 
tom dash-dot line) and for the test of H^ R (top dash-dot line), for prior probabil- 
ities tt((0, 0)) = 0.80,tt((0, 1)) = tt((1,0)) = 0.08,7r((l,l)) = 0.04 (left) and 
tt((0, 0)) = 0.50, tt((0, 1)) = tt((1, 0)) = 0.23, tt((1, 1)) = 0.04 (right) . 

when testing H^ rA is larger when 7r((0, 0)) is smaller ; (3) the difference when test- 
ing H^ R is larger when 7r((0, 0)) is larger. The reason for (2) is that FdrBii{Zj) is 
overly conservative by a factor of l/7r((0, 0)) when testing H^ A , and the reason for 
(3) is that in the computation of Fdr bh (Zj ) the p-value in the numerator is computed 
conservatively assuming the configuration is (hi, /12) G {(0, 1), (1, 0)}. 

3.2 Estimating the conditional z-score densities 

In order to estimate fi —i(-) we used the R package locfdr ( Efron , 2010b . The 



details of our density estimation for study i G {1, . . . ,n) are as follows. First, the 
z-scores {zij : j = 1, . . . , M} were divided into B bins of equal size. Let 2^1 • • • x^b 
be the centers of these bins. At these centers, the locfdr package gives an estimate of 
the conditional density given that ^ 0, which we denote by fa a, (The density fi t A 
is estimated in locfdr as follows: /j is estimated by Poisson regression with default 
parameters, and fi t A is extracted after estimation of tto as detailed in Section l2~2l ) 
Since f^A is bimodal, with zero probability for bins around zero, we set 

i , v J ifz i>6 <0, . , f ifa; b >0, 

/i,A,i(^i,6) = i ? / s . n and Ji.A~i(Xi.b) = < t , s . t „ 

I JA(Xi,b) lfXi, h >0. [ TA(Xi,b) lf^,f,<0. 

We forego the effort of estimating the z-score density at each z-score value. Instead, 
we assign each z-score into the bin that it is in, denoted by gy G {1, . . . , B}. For 
hj G { — 1,1} and b G {1, . . . , £?}, we estimate the probability of being in bin b by 

7 /r\ fi.A.hi(Xi.b) 
Ji,hi (0) - 



2~2l=l fi,A,hi(%i,l) 
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Conditional on Hij = 0, the density of Zij, denoted by /o(-)> i s standard normal. 
We therefore define 

For SNP j, the vector of n binned z-scores Zj = (zij,. . . , z n j ) given configuration 
Hj has probability 

n 

/&i4-)=n 

i=l 

where S fi,o, fi,i} is the estimated probability distribution of % given 

Hij. 

3.3 Estimating the probabilities of the different association config- 
urations 

The 3™ — 1 probabilities of the multi-group model we want to estimate are ir = {tv(Ii) : 
h € H, J2h<=u n W = 1}- For the n independent studies, the likelihood of the binned 
z-scores for each SNP across the n studies is 

L(n;SjJ) = Pr(%|ff,/) = /(% | ^)tt(H), 

In order to compute the full likelihood we need to know the joint distribution 
of (Hi ■ ■ ■ Hm) as well as for each study i (i = 1, . . . , n) the joint distribution of 
(zn, . . . , zim) given (Ha, . ■ . , HiM- Since the joint distributions is unknown, we 
compute instead the composite likelihood, which is the product of the M SNPs marginal 
likelihoods, 

L CL (7f;zJ) = nfUL(7i;~z,J). 

Although the composite likelihood is different than the full likelihood, in large prob- 
lems with local dependency the maximum lik elihood estimates of t he composite like- 
lihood and the full likelihood are very similar (ICox and Re id. 2004). 

3.3.1 Maximizing the composite likelihood 

We use the expectation maximization (EM) algorithm to find if that maximizes the 
composite likelihood. The observed data are the binned z-scores z\ , . . . , zm and the 
missing values are Hi, ■ ■ - , Hm- The complete likelihood for SNP j is 

L c {Z-,z j J,Hj) = f(zj\Hj)*{Hj). 

The composite complete likelihood for all the SNPs is 

n™L c (Z- ZjJ, Hj) = nMf(zj\Hj)n(Hj). 
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E step In the E step we calculate the expected value of the log composite likelihood 
function, with respect to the conditional distribution of H given z under the current 
estimate of the parameters, 7?w ; 

Q(tt| tt«) = En^llogiUVjizAHs) ■ n(Hj)}] 

M 

= E _E Pr ^ = ^,* W )[log/(%l4- = h) + log{n(h)}} 

J =1 Ken 

M 

= E _E Pr (^ = hWi^logf&lHj = h) 
i =1 hen 

M 

+ E E Pr (^- = ^i%^ (t) )iog{^(M} (8) 



where 



M step Find 7f( t+1 ) that maximizes Q(7?| ifw). Since the second sum in equation ^ 
has the same form as the log-likelihood for the multinomial distribution, it follows that 

7r< ,+i >(K) = :*>•*''> . 

E,r.«E,",P'W = '''l^,* 1 ") 

The updated parameters are 7r( t+1 ) = {i:^ t+ls> {h) : h £ H}. 

4 Simulation studies 

The goal of the simulations was twofold. First, to investigate the effect of the number of 
SNPs M, and the dependence across SNPs, on the empirical Bayes procedure. Second, 
to compare the empirical Bayes procedure to the alternative suggested by the partial 
conjunction approach. In the empirical Bayes analysis, 50 bins were used to estimate 
the alternative densities, and SNPs were considered discovered if the estimated Bayes 
FDR in equation (|7]i was below q = 0.05. In the partial conjunction analysis, the BH 
procedure was applied at the same level q = 0.05. 



4.1 Independence within each study 

We considered n = 3 studies, and M G {1000, 10000, 100000} SNPs in each study. 
Although there were 3" = 27 possible configurations of the vector of associations sta- 
tus, our data generation process had positive probability only for the 15 configurations 
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that do not have a positive and negative association for the same SNP: configuration 
(0, 0, 0) for 90% of the SNPs; the six configurations with exactly one true association, 
i.e. Hj s.t. \Hij \ — 1, each for 1% of the SNPs; the eight configurations with at 

least two true associations in the same direction, i.e. Hj s.t. | Yli=i ^ij I > 2, each for 
0.5% of the SNPs. 



Following Wakefield (2007), we simulated data for every SNP independently with 
disease risk, pij, given by the logistic regression model logit(pij) = a + u6ij, where 
u = 0, 0.5, and 1 corresponds to 0,1 and 2 copies of the mutant allele, respectively. We 
sample Oij given H^ as follows: 

( [/[Q.25,0.5] if = 1, 
O^Hij^l ifffy-0, 

[ U[-0.5, -0.25] if Hij = -1. 

where U(a, b) denotes the uniform distribution between a and b. 

Moreover, the minor allele frequency (MAF) for each SNP j in study i, is sampled 
from U (0.05, 0.50), and we set a = —6, so e a = 0.0025 is the prior odds of a disease 
due to a SNP with 2 = 0. 



Results The simulation results were based on 50 repetitions for M = 100, 000, and 
on 100 repetitions for M = 10,000 and M = 1000. Figure [3] shows the FDP in an 
analysis to discover associations (top) and in a replicability analysis. The variation in 
FDP decreases with M, and is very small for M — 100,000. Table [2] presents the 
average FDP, and number of rejections, R. Although the average FDP of the empirical 
Bayes analysis was below 0.05 for M > 10000, the average FDP when M — 1000 was 
0.068, with a standard error (SE) of 0.006. The empirical Bayes approach makes only 
few more discoveries than the partial conjunction approach in an analysis to discover 
associations, but ten-fold more discoveries in replicability analysis. For example, for 
M = 100000 SNPs the empirical Bayes analysis discovers on average 1161 SNPs 
with replicated associations, while the partial conjunction approach discovers only an 
average of 103.1 SNPs. 

Remark 4.1. Unlike in the setting ofRemark \3.3\ Tablets hows that the average FDP 
for the partial conjunction approach in an analysis to discover associations was much 
lower than 7r(0, 0, 0) x 0.05 = 0.045. For example, for M — 1000 the average FDP 
was 0.0256, with SE of 0.0026. This is due to the discreteness of the distribution of 
the p-values, that were computed from contingency tables. Indeed, when the sample 
size was tripled, the p-values from true no association null hypotheses were closer to 
uniform and therefore the average FDP was closer to the nominal level (not shown). 
However, as in Remark \3.3\ the over-conservativeness of the replicability analysis re- 
mained severe when the sample size was tripled. 



4.2 GWA dependency within each study 

We simulated three GWA studies from the simulator HAPGEN2 dSu et alll201 ll). The 



three studies where generated from three samples of the HapMap project ( The International Ha pMap Consortium, 
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Table 2: The average and SE of FDP, and number of rejections R, in an empirical Bayes 
analysis (columns 3 and 4) and in an analysis using the partial conjunction approach 
(columns 5 and 6), for different values of M. 







Empirical Bayes analysis 


Partial conjunction analysis 


M 


analysis type 


FDP (SE) 


R(SE) 


FDP (SE) 


R(SE) 


100000 


Replicability 


0.0382 (0.0009) 


1161.32 (6.57) 


0.0002 (0.0002) 


103.16(1.96) 


10000 


Replicability 


0.0486 (0.0018) 


121.72(1.42) 


0.0000 (0.0000) 


10.77(0.49) 


1000 


Replicability 


0.0678 (0.0062) 


13.41 (0.46) 


0.0000 (0.0000) 


1.4900 (0.14) 


100000 


Association 


0.0362 (0.0005) 


3336.90 (8.50) 


0.0363 (0.0005) 


3153.38 (9.47) 


10000 


Association 


0.0369 (0.0001) 


337.90(1.80) 


0.0358(0.0011) 


318.04(1.79) 


1000 


Association 


0.0424 (0.0030) 


34.80 (0.54) 


0.0256 (0.0026) 


31.22 (0.56) 



20031) : a sample of 87 individuals with African ancestry in Southwest USA (ASW), a 



sample of 165 Utah residents with Northern and Western European ancestry (CEU), 
and a sample of 109 Chinese in Metropolitan Denver, Colorado (CHD). We limited 
ourselves to chromosomes 1-4, that contained M = 415, 154 SNPs. In these popu- 
lations, the number of causal SNPs was 26 for ASW, 22 for CEU and 27 for CHD. 
Since the effects are typically small for GWA studies, we consider for each population 
four sub-populations, and within each sub-population about 1/4 of the causal SNPs had 
an increased multiplicative relative risk of 1.5. Overall, there were 48 different causal 
SNPs in the four chromosomes, out of which 22 SNPs were causal in more than one 
population. Specifically, the three populations had five causal SNPs in common, and 
in addition, the number of causal SNPs in common in exactly two of the three popu- 
lations was: four for ASW and CEU, seven for ASW and CHD, and six for CEU and 
CHD. Each study contained 8000 cases and 8000 referents from each population. In 
the simulated studies, the linkeage disequilibrium (LD) across SNPs as measured for 
the samples in the HapMap project was retained. 

Due to LD, the number of SNPs associated with the phenotype in every study was 
larger than the number of causal SNPs. Since it is not known from the data generation 
process which SNPs are associated with the phenotype in each study, then for a non- 
causal SNP j we do not know whether H° € {H% A ,H^ R } are false, since non-causal 
SNPs may have false H° due to LD patterns in the different populations. Since a major 
goal in the simulations was to assess whether the FDP is inflated, it was necessary to 
establish a ground truth. We wanted to estimate a conservative ground truth that with 
very high probability estimates a SNP as having a true H° if indeed it is from H°, at 
the possible expense of estimating a SNP as having a true H even if H was false. The 
estimation of the ground truth was as follows. The simulation studies were repeated 20 
times, resulting in 20 p-values per population for every SNP. The 20 p-values were first 
combined with Fisher's combining method, and the partial conjunction approach was 
applied to the combined p-values from the three populations, to form for each SNP a 
combined p-value for H° 6 {H^ A that is based on 20 studies per population. 

H was considered to be false for a SNP if the p-value for testing H was below the 
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Figure 3: Replicability analysis (top) and analysis to discover association (bottom): 
Boxplots of FDP for M=100000, M=10000, and M=1000 for empirical Bayes analysis 
(left) and partial conjunction analysis (right). 

severe Bonferroni threshold for FWER control at level 0.05. The resulting ground truth 
contains 2126 SNPs associated with the phenotype, i.e. with false H^ A , and 695 SNPs 
with replicated association with the phenotype, i.e. with false H^ R . The ground truth 
based on 20 repetitions was very similar to a ground truth that was established based 
on only 19 of the 20 repetitions, and therefore for an analysis of one repetition, the 
resulting FDP using the ground truth based on 20 repetitions was very similar to the 
FDP using the ground truth that results from the 19 repetitions excluding the repetition 
being analyzed. 

Results Table [3] shows the analysis results for the 20 repetitions of the three stud- 
ies. While the average number of rejections was only slightly larger with the empirical 
Bayes analysis than with the partial conjunction approach for testing associations, it 
was more than 20 times larger than when testing for replicated associations. The av- 
erage FDP for the empirical Bayes analysis was larger than for the partial conjunction 
approach, and slightly above the nominal FDR level of 0.05. There are two possible 
explanations for the slightly elevated average FDP: either the ground truth was too con- 
servative, so that rejections that are viewed as false rejections are only counted as such 
due to the conservative estimation of the ground truth, or that the empirical Bayes anal- 
ysis is indeed slightly anti-conservative for the type of dependency that occurs in GWA 
studies. Nevertheless, this simulation demonstrates that the gain in using an empirical 
Bayes analysis over the partial conjunction approach for replicability analysis is great, 
and arguably outweighs the risk that the FDP may be slightly inflated. 
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Table 3: The average number of rejections and FDP of the different analysis methods. 
SE in parentheses based on 20 simulated datasets. 



Method 


mean R (SE) 


mean FDP (SE) 


assoc-conjunction 


242.7 (10.4) 


0.053 (0.005) 


rep-conjunction 


6.4(1.2) 


0.000 (0.000) 


assoc-eBayes 


274.9(12.4) 


0.072 (0.009) 


rep-eBayes 


154.1 (8.5) 


0.065 (0.009) 



5 Replicability analysis of T2D GWA studies 

We received permission to use t he p- values for the f ollowing six studies in the DIAGRAM - 
data used for meta-analysis in lVoight et ail d2010t) : EUROSPAN, DECODE, ERGO, 



DGI, FUSION, and WTCCC . Figure 1 in Appendix IB! shows the empirical z-scores, 
as well as the estimated conditional densities, for each of the six studies, as outputted 
from the locfdr package. Since two of the studies had an estimated fraction of null 
hypotheses of one, these studies were excluded from the empirical Bayes analysis. The 
remaining estimated fractions of null hypotheses were 0.89, 0.98, 0.98, and 0.96, re- 
spectively, for studies DECODE, DGI, FUSION, and WTCCC. For n = 4 studies,the 
sets H, T~L%n contain, respectively 81 and 21 configurations, and H% A contains only 
the configuration (0, 0, 0, 0). 

The empirical Bayes analysis at level q = 0.05 discovered 803 SNPs associated 
with T2D and 219 SNPs with replicated association with T2D. The partial conjunction 
approach at level q = 0.05 based on the four studies, discovered 447 SNPs associated 
with T2D and 83 SNPs with replicated association with T2D, and based on the six 
studies discovered 466 SNPs associated with T2D and 113 SNPs with replicated asso- 
ciation with T2D. A list of the 219 SNPs with replicated associations discovered by the 
empirical Bayes analysis, sorted by positions on the chromosome, is given in Appendix 
IB] SNPs with replicated association included 16 distinct genes. Table|4]shows a subset 
of 17 rows from the Table in Appendix iBl of the SNPs with smallest estimated local 
Bayes FDR in each of these 16 region as well as of the SNP with smallest estimated 
local Bayes FDR among all SNPs that were not in gene regions. The estimated Bayes 
FDR for replicability analysis as well as for the analysis to discover association, and 
the adjusted p-values from the corresponding partial conjunction analysis based on all 
six available studies is given for every SNP. As expected, the estimated Bayes FDR is 
larger for replicability analysis than for an analysis to discover associations, and the 
ranking for replicability is different than for discovering associations. For example,the 
empirical Bayes analysis for KIF1 1 ranks it 7th for evidence of replicability but 5th 
for evidence of association; KCNJ11 is ranked 5th for evidence of replicability but 
8th for evidence of association. While the partial conjunction approach also indicates 
that there is evidence of association in almost all these regions, evidence of replicated 
association is inferred only for five regions. 

The empirical Bayes approach provides for each SNP a measure of belief in each 
possible configuration h conditional on its vector of z-scores. For example, the vector 
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of z-scores for SNP rs7903146 in gene TCF7L2 was z = (-8.8, -4.5, -4.4, -7.5) in 
studies DECODE, DGI, FUSION, and WTCCC, respectively. The estimated posterior 
probability was 0.98 that the configuration was h = (—1, —1, —1, —1), conditional 
on the binned z-score vector. The vector of z-scores for SNP rs 10923931 in gene 
NOTCH2 was z = (-3.4, -4.9, -0.12, -2.8) with estimated posterior probability 
0.92 for configuration h = (—1,-1,0,-1). Table [5] shows the estimated posterior 
probability distributions for these two SNPs. 



Table 4: SNPs with smallest estimated local Bayes FDR in each of the 16 region dis- 
covered by the replicability analysis, as well as of the SNP with smallest estimated 
local Bayes FDR among all SNPs that were not in gene regions. 





chr 


pos 


gene 


rep.bayes.fdr 


assoc. bayes. fdr 


rep.conj.pvadj 


assoc. conj.pvadj 


rs7903146 


10 


114758349 


TCF7L2 


2.40e-ll 


4.61e-22 


0.00e+00 


0.00e+00 


rsl0440833 


6 


20688121 


CDKAL1 


1.60e-05 


8.06e-08 


9.06e-09 


0.00e+00 


rs5015480 


10 


94465559 


$NO.GENE$ 


1.10e-03 


7.74e-05 


8.78e-04 


1.12e-07 


rs4402960 


3 


185511687 


IGF2BP2 


3.14e-03 


6.87e-04 


2.05e-02 


3.51e-05 


rs5215 


11 


17408630 


KCNJ11 


8.91e-03 


4.50e-03 


1.00e+00 


2.36e-02 


rs757110 


11 


17418477 


ABCC8 


9.98e-03 


6.16e-03 


1.00e+00 


2.67e-02 


rs4933734 


10 


94414567 


KIF11 


l.lle-02 


2.96e-04 


1.00e+00 


1.55e-05 


rs 10923931 


1 


120517959 


NOTCH2 


1.34e-02 


2.70e-03 


1.00e+00 


3.45e-04 


rsl 1187033 


10 


94262359 


IDE 


1.89e-02 


2.07e-03 


1.86e-02 


7.07e-06 


rs3 19602 


5 


134222164 


TXNDC15 


2.02e-02 


7.07e-03 


1.00e+00 


3.64e-02 


rs849134 


7 


28196222 


JAZF1 


2.10e-02 


7.80e-03 


9.84e-01 


1.16e-03 


rs6883047 


5 


134272055 


PCBD2 


2.35e-02 


8.55e-03 


1.00e+00 


4.71e-02 


rsl0832778 


11 


17394073 


B7H6 


2.82e-02 


1.64e-02 


1.00e+00 


1.53e-01 


rsl 3070993 


3 


12217797 


SYN2 


3.70e-02 


2.35e-02 


1.00e+00 


3.69e-02 


rsl0433537 


3 


12198485 


SYN2,TIMP4 


3.60e-02 


2.33e-02 


1.00e+00 


3.86e-02 


rs 101 13282 


8 


96038252 


C8orf38 


3.87e-02 


1.02e-02 


1.00e+00 


4.08e-02 


rsl 554522 


17 


25913172 


KSR1 


4.36e-02 


1.45e-02 


1.00e+00 


2.13e-01 



6 summary 

We presented an empirical Bayes replicability analysis. Three comments about our 
implementation of the suggested approach follow. First, we have taken an of-the-shelf 
package, locfdr, to estimate the alternative densities, and the null density of the z- 
scores was assumed to be standard normal. In order to get stable estimates, we binned 
the z-scores and estimated the alternative probabilities of being in the different bins. 
The empirical Bayes analysis can be used with other probability or density estimation 
procedures that may be more appropriate for a replicability analysis in applications 
other than GWA studies. Second, in our analysis we assumed that the null z-scores 
were standard normal, but the locfdr package allows for several different null types, 
and other null types may be used. Third, we excluded studies for which the locfdr 
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Table 5: The estimated posterior probabilities for each configuration h, conditional on 
the binned z-score of z, for two example z-scores. 



h z = (-8. 


8,-4.5,-4.4,-7.5) 


,? = (-3.4,-4.9,-0.12,-2.8) 


(-1,-1,-1,-1) 


0.980 


0.000 


(-1,-1,0,-1) 


0.012 


0.924 


(-1,-1,0,0) 


0.000 


0.047 


(-1,0,-1,-1) 


0.008 


0.000 


(-1,0,0,-1) 


0.000 


0.004 


(0,-1,0,-1) 


0.000 


0.024 


(0,-1,0, 0) 


0.000 


0.001 



estimate of the proportion of null hypotheses was one, but an alternative approach 
to consider is to ignore the estimated proportion of null hypotheses, and proceed to 
estimate the non-null distributions and incorporate these studies in the analysis. 

The accuracy of the empirical Bayes analysis relies on the ability to estimate well 
the unknown parameters. We demonstrated in simulations that the variability of the 
FDP decreases as the number of hypotheses increased. In a simulation of realistic 
GWA studies we demonstrated that the empirical Bayes analysis produced inferences 
with a small FDP, despite the dependency among the p-values within each study. A 
full Bayesian approach to the problem of GWA studies replicability is not possible, 
since we do not know the true likelihood, and in order to estimate the probabilities of 
each of the 3" configurations of null and non-null hypotheses, we used the product 
of the marginal SNP likelihoods. In applications were the exact likelihood is known, 
it is possible to use a full Bayesian approach, so that the suggested framework for 
replicability analysis can be extended to provide estimates of the uncertainty of the 
Bayes FDR estimates. 

From a comparison of an empirical Bayes analysis with the partial conjunction ap- 
proach, we see that they may give similar inferences when the analysis is aimed at 
discovering associations. However, for replicability the empirical Bayes analysis dis- 
covers many more replicated associations than using the partial conjunction approach. 
The low power of the partial conjunction approach is due to two main limitations. First, 
the p-value is not the optimal test statistic. Second, it has a distribution that is stochas- 
tically much larger than the uniform, resulting in the BH procedure being conservative. 
The empirical Bayes replicability analysis does not suffer from these limitations, since 
it estimates the optimal rejection region by estimating the joint densities of the z-scores 
across studies and the probabilities of the various configurations of association status 
for every SNP. 
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A Proof of Proposition 2.1 

Proof. Let 4>or{z) be the indicator of whether z G Z r, and let (f>(z) be the indicator 
of whether z£2 for another rejection region that satisfies Fdr(Z) < q. Straightfor- 
ward calculus shows for every z 

<t>(z)(l - fdr(z)/t(q)) < <fa R (z)(l - fdr(z)/t(q)) (9) 
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Taking expectations on both sides of equation (0, 



/ (<j>(z)(l - fdr(z)/t(q)))f(z)dz < J [4>or{z){1 - fdr(z)/t(q)))f(z)dz, 

we receive the following expression: 

P(Z)(1 - Fdr{Z)/t{q)) < P(Z OR )(l - Fdr(Z OR )/t(q)) (10) 

Since Fdr(Z) < Fdr(2 R) < t(q) it follows that (1 - Fdr(Z)/t(q)) > (1 - 
Fdr{Zon) /t(q)) > 0. Therefore, the right hand side of expression ( fT0T > is smaller 
than P(Z r){1 - Fdr{Z)/t(q)) and the result follows. □ 



B Supplementary information on the replicability anal- 
ysis of T2D GWA studies 

Figure [4] shows the empirical z-scores, as well as the estimated conditional densities, 
for each of the six studies, as outputted from the locfdr package. 

The table below gives the list of the 219 SNPs with replicated associations, as dis- 
covered by the empirical Bayes analysis, sorted by positions on the chromosome. The 
positions were found by NCBI build GRCh37.p5 reference assembly, and they were 
mapped to nearby genes by dbSNP 

( |http : / /www . ncbi . nlm . nih . gov/pro jects/ SNP/ dbSNP . cgi?list=rslist| l 
The table shows the estimated Bayes FDR for replicability analysis as well as for the 
analysis to discover association, and the adjusted p-values from the corresponding par- 
tial conjunction analysis based on all six available studies. The majority of SNP dis- 
coveries of replicated associations were in chromosomes 6 and 59, were 68 and 59 
replicated associations were discovered respectively. 





chr 


pos 


gene 


rep.lfdr 


assoc. If dr 


rep.pc 


assoc. pc 


rs 10923931 


1 


120517959 


NOTCH2 


1.34e-02 


2.70e-03 


1.00e+0() 


3.45e-04 


rs6442307 


3 


12143355 


SYN2 


4.43e-02 


2.74e-02 


1.00e+00 


4.89e-02 


rsll715886 


3 


12147236 


SYN2 


4.40e-02 


2.73e-02 


1.00e+00 


4.89e-02 


rs4488811 


3 


12182028 


SYN2 


3.74e-02 


2.36e-02 


1.00e+00 


4.21e-02 


rsl 1721223 


3 


12185160 


SYN2 


3.67e-02 


2.35e-02 


1.00e+00 


4.11e-02 


rsl 1708978 


3 


12188495 


SYN2 


3.64e-02 


2.34e-02 


1.00e+00 


4.06e-02 


rs6792867 


3 


12189900 


SYN2 


3.77e-02 


2.37e-02 


1.00e+00 


4.02e-02 


rs7629805 


3 


12192394 


SYN2 


4.79e-02 


3.15e-02 


1.00e+00 


5.10e-02 


rsl0433537 


3 


12198485 


SYN2.TIMP4 


3.60e-02 


2.33e-02 


1.00e+00 


3.86e-02 


rsl 3070993 


3 


12217797 


SYN2 


3.70e-02 


2.35e-02 


1.00e+00 


3.69e-02 


rsl 1720578 


3 


12267084 


$NO. GENES 


4.33e-02 


2.68e-02 


1.00e+00 


4.81e-02 


rsl3071168 
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